Example to produce SQL without Entity Framework Core! #1361

Merged 1 commit on Oct 27, 2023


bkoelman
Member

Why

We've always claimed it's possible to use JsonApiDotNetCore without Entity Framework Core. Just implement your own resource service or repository, right?

There's an implementation for MongoDB using its LINQ provider and there's an example that takes a LINQ expression, compiles it, then executes it against an in-memory list.

But we never told you what it takes to translate complex JSON:API requests to SQL yourself. So let's put our money where our mouth is: this PR shows how it can be done!

What

This PR provides an implementation for most of the JsonApiDotNetCore features. It supports all JSON:API endpoints (including atomic operations) and query string parameters (both top-level and deeply nested), as well as custom resource definition callbacks.

DapperRepository implements IResourceRepository and uses Dapper to execute ADO.NET database queries and to materialize the returned result set into JSON:API resources. This example lets Entity Framework Core generate the database at startup (for convenience) but doesn't use it for serving requests. Information about the underlying database model (tables, columns, and foreign keys) is needed to produce SQL. This is provided by IDataModelService. For convenience again, FromEntitiesDataModelService obtains that information from the Entity Framework Core model at startup, but feel free to plug in something else.

At a high level, QueryLayer is translated into a tree of SqlTreeNode objects representing the SQL query. SqlQueryBuilder takes that as input and produces SQL text from it. It's mostly SQL-92 compliant and supports PostgreSQL, MySQL, and SQL Server. Adapting it to your own flavor should be straightforward.

For example, the following GET request:

/people
  ?include=ownedTodoItems.tags
  &filter=not(equals(firstName,'X'))
  &sort=-lastName
  &fields[people]=lastName
  &filter[ownedTodoItems]=not(equals(description,'Y'))
  &sort[ownedTodoItems]=count(tags),assignee.lastName
  &fields[todoItems]=description
  &filter[ownedTodoItems.tags]=not(equals(name,'Z'))
  &sort[ownedTodoItems.tags]=name,-id
  &fields[tags]=name

Gets translated by JsonApiDotNetCore (unchanged) into:

QueryLayer<Person>
{
  Include: ownedTodoItems.tags
  Filter: not(equals(firstName,'X'))
  Sort: -lastName
  Selection
  {
    FieldSelectors<Person>
    {
      lastName
      id
      ownedTodoItems: QueryLayer<TodoItem>
      {
        Filter: not(equals(description,'Y'))
        Sort: count(tags),assignee.lastName
        Selection
        {
          FieldSelectors<TodoItem>
          {
            description
            id
            tags: QueryLayer<Tag>
            {
              Filter: not(equals(name,'Z'))
              Sort: name,-id
              Selection
              {
                FieldSelectors<Tag>
                {
                  name
                  id
                }
              }
            }
          }
        }
      }
    }
  }
}

Then DapperRepository (with the help of SelectStatementBuilder) translates that into:

-- Executing SQL with parameters: @p3 = 'Z', @p2 = 'Y', @p1 = 'X'
SELECT t1."Id", t1."LastName", t7."Id", t7."Description", t7.Id00 AS Id, t7."Name"
FROM "People" AS t1
LEFT JOIN (
    SELECT t2."Id", t2."Description", t2."OwnerId", t4."LastName", t6."Id" AS Id00, t6."Name"
    FROM "TodoItems" AS t2
    LEFT JOIN "People" AS t4 ON t2."AssigneeId" = t4."Id"
    LEFT JOIN (
        SELECT t5."Id", t5."Name", t5."TodoItemId"
        FROM "Tags" AS t5
        WHERE NOT (t5."Name" = @p3)
    ) AS t6 ON t2."Id" = t6."TodoItemId"
    WHERE NOT (t2."Description" = @p2)
) AS t7 ON t1."Id" = t7."OwnerId"
WHERE (NOT (t1."FirstName" = @p1)) OR (t1."FirstName" IS NULL)
ORDER BY t1."LastName" DESC, (
    SELECT COUNT(*)
    FROM "Tags" AS t3
    WHERE t7."Id" = t3."TodoItemId"
), t7."LastName", t7."Name", t7.Id00 DESC
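Note how the top-level WHERE clause gains an extra IS NULL test, while the nested filters do not. Under SQL's three-valued logic, NOT (NULL = 'X') evaluates to NULL, so rows with a NULL first name would otherwise be dropped. A minimal C# sketch of that rewrite follows; it assumes FirstName is a nullable column and that Description and Name are not, and NullSafeFilterSketch is a hypothetical name, not a type from this PR.

```csharp
public static class NullSafeFilterSketch
{
    // Builds the SQL text for not(equals(column, parameter)).
    // For a nullable column, rows where the column is NULL must be kept explicitly,
    // because NOT (NULL = @p) evaluates to NULL instead of TRUE.
    public static string NegateEquals(string column, string parameter, bool columnIsNullable)
    {
        string negated = $"NOT ({column} = {parameter})";
        return columnIsNullable ? $"({negated}) OR ({column} IS NULL)" : negated;
    }
}
```

Called with a nullable column, this reproduces the shape of the top-level WHERE clause above; with a non-nullable column, it reproduces the shape of the nested ones.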

For less involved requests, simpler SQL is produced where possible. For example:

GET /todoItems?include=assignee,owner,tags HTTP/1.1

Produces the following SQL (no sub-queries):

SELECT t1."Id", t1."CreatedAt", t1."Description", t1."DurationInHours", t1."LastModifiedAt", t1."Priority", t2."Id", t2."FirstName", t2."LastName", t3."Id", t3."FirstName", t3."LastName", t4."Id", t4."Name"
FROM "TodoItems" AS t1
LEFT JOIN "People" AS t2 ON t1."AssigneeId" = t2."Id"
INNER JOIN "People" AS t3 ON t1."OwnerId" = t3."Id"
LEFT JOIN "Tags" AS t4 ON t1."Id" = t4."TodoItemId"
ORDER BY t1."Priority", t1."LastModifiedAt" DESC, t4."Id"

In the SQL above, the ordering on Priority and LastModifiedAt originates from a resource definition.

Limitations

First of all, this is not a mature, battle-tested, and optimized implementation. If you can, please use Entity Framework Core instead, because:

  • It produces more efficient SQL in many cases.
  • It provides a split-query mode to prevent cartesian explosion.
  • It performs way better due to pre-compiled projection shapers and query caching.
  • It uses advanced algorithms to push down into a sub-query and pull it out again at various stages.
  • It is covered by a massive amount of tests.

That said, if you're not too concerned about performance or absolute correctness (there are likely bugs; please report them via issues or PRs), you're welcome to try it out or use it as an inspiration to implement your own data access.

The following limitations apply:

  • No pagination. Surprisingly, this is insanely complicated and requires non-standard, vendor-specific SQL (JOIN LATERAL/OUTER APPLY, ROW_NUMBER() OVER (PARTITION BY ...)). I've spent a long time trying to pull it off but eventually gave up. I challenge you!
  • No many-to-many relationships. It requires additional information about the database model but should be possible to implement.
  • No resource inheritance. Requires additional information about the database and is complex to implement.
  • No composite primary/foreign keys. It could be implemented, but it's a corner case that few people use.
  • Only parameterless constructors in resource classes. This is because materialization is performed by Dapper, which doesn't support constructors with parameters.
  • Simple change detection in write operations. It includes scalar properties, but relationships go only one level deep. This is sufficient for JSON:API.
  • The database table/column/key name mapping is based on hardcoded conventions. This could be generalized but I didn't do so to keep it simple.
  • Cascading deletes are assumed to occur inside the database, which SQL Server does not support very well. This is a lot of work to implement.
  • No [EagerLoad] support. It could be done, but it's rarely used.
  • Untested with self-referencing resources and relationship cycles.
  • No support for IResourceDefinition.OnRegisterQueryableHandlersForQueryStringParameters(). Because no IQueryable is involved, it doesn't apply.
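To give an idea of what the missing pagination would involve, here is a rough C# sketch of the ROW_NUMBER() variant for a single to-many relationship. It is not part of this PR: PaginationSketch, its method names, and the @first/@last parameter names are all made up for illustration, and it covers only the simplest case. The hard part, combining this with nested includes, filters, and sorts, is not attempted.

```csharp
using System;

public static class PaginationSketch
{
    // Converts a 1-based page number and a page size into an inclusive row-number range.
    public static (int First, int Last) GetRowRange(int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        int first = (pageNumber - 1) * pageSize + 1;
        return (first, first + pageSize - 1);
    }

    // Emits a sub-query that numbers the related rows per parent (per foreign key value),
    // then keeps only the rows whose number falls inside the requested page.
    // Table and column names are expected to come from the data model, never from user input.
    public static string BuildPagedChildQuery(string table, string foreignKey, string orderBy)
    {
        return
            "SELECT * FROM (" +
            $"SELECT t1.*, ROW_NUMBER() OVER (PARTITION BY t1.\"{foreignKey}\" ORDER BY t1.\"{orderBy}\") AS RowNum " +
            $"FROM \"{table}\" AS t1" +
            ") AS t2 WHERE t2.RowNum BETWEEN @first AND @last";
    }
}
```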

Implementation

At a high level, there are many similarities with how Entity Framework Core performs the translation to SQL. I often struggled to grasp patterns from its source code, so I inferred most using trial and error.

The tree of SQL nodes

In this example, all nodes derive from SqlTreeNode. Most of them are straightforward and don't require explanation.
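To give a feel for how a tree of nodes turns into SQL text, here is a heavily simplified C# sketch. The Mini* types and MiniSqlBuilder are hypothetical stand-ins invented for this illustration (assuming C# 9 or later); the real node hierarchy and SqlQueryBuilder in this PR are considerably richer.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical, simplified node types. Records make them immutable by default.
public abstract record MiniNode;
public sealed record MiniTable(string Name, string Alias) : MiniNode;
public sealed record MiniColumn(MiniTable Table, string Name) : MiniNode;
public sealed record MiniParameter(string Name, object? Value) : MiniNode;
public sealed record MiniEquals(MiniColumn Left, MiniParameter Right) : MiniNode;
public sealed record MiniNot(MiniNode Inner) : MiniNode;
public sealed record MiniSelect(IReadOnlyList<MiniColumn> Columns, MiniTable From, MiniNode? Where) : MiniNode;

public static class MiniSqlBuilder
{
    // Walks the tree recursively and emits SQL text for each node type.
    public static string Build(MiniNode node) => node switch
    {
        MiniTable table => $"\"{table.Name}\" AS {table.Alias}",
        MiniColumn column => $"{column.Table.Alias}.\"{column.Name}\"",
        MiniParameter parameter => parameter.Name,
        MiniEquals comparison => $"{Build(comparison.Left)} = {Build(comparison.Right)}",
        MiniNot negation => $"NOT ({Build(negation.Inner)})",
        MiniSelect query => BuildSelect(query),
        _ => throw new NotSupportedException(node.GetType().Name)
    };

    private static string BuildSelect(MiniSelect query)
    {
        var builder = new StringBuilder();
        builder.Append("SELECT ");
        builder.Append(string.Join(", ", query.Columns.Select(column => Build(column))));
        builder.Append(" FROM ");
        builder.Append(Build(query.From));

        if (query.Where != null)
        {
            builder.Append(" WHERE ");
            builder.Append(Build(query.Where));
        }

        return builder.ToString();
    }
}
```

Keeping the tree separate from the text generation is what makes it feasible to adapt the output to another SQL flavor: only the builder needs to change.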

All nodes are immutable, yet they expose members as read-only collections rather than immutable ones. There are two reasons for this:

  1. Insertion order is preserved in Dictionary<,> with a string key. This is not true with ImmutableDictionary<,>, because it relies on the non-deterministic String.GetHashCode(). We need to know the exact SQL in tests.
  2. Due to generic variance, derived types can expose a collection of more derived elements.
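Both reasons can be shown in a few lines of C#. The type and method names below are hypothetical and exist only for this illustration. Keep in mind that the enumeration order of Dictionary<,> is an observed behavior (when no entries are removed), not a documented guarantee.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical column types: a base type and a more derived one.
public abstract record ColumnSketch(string Name);
public sealed record TableColumnSketch(string Name, string TableAlias) : ColumnSketch(Name);

public static class ReadOnlyCollectionSketch
{
    // Reason 1: Dictionary<,> enumerates its keys in insertion order as long as nothing is removed,
    // so text generated from it is stable across test runs.
    public static string JoinKeys(IReadOnlyDictionary<string, object?> parameters) =>
        string.Join(", ", parameters.Keys);

    // Reason 2: IReadOnlyList<out T> is covariant, so a list of derived columns can be passed
    // where a list of base columns is expected, without copying or casting.
    public static string JoinNames(IReadOnlyList<ColumnSketch> columns) =>
        string.Join(", ", columns.Select(column => column.Name));
}
```

For example, an IReadOnlyList<TableColumnSketch> can be handed directly to JoinNames, which would not compile with List<T> or ImmutableList<T> as the parameter type.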

The abstract type TableSourceNode contains a list of ColumnNodes. Derived type TableNode represents a database table, while SelectNode represents a sub-query. ColumnNode is also abstract, with derived types ColumnInTableNode and ColumnInSelectNode. SelectNode contains a list of abstract SelectorNodes per table, with implementations ColumnSelectorNode (SELECT t1.Name), CountSelectorNode (SELECT COUNT(*)), and OneSelectorNode (SELECT 1).
ColumnSelectorNode points to a ColumnNode (optionally aliased), so it can be a column in a table or a sub-query.

These abstract columns in TableSourceNode don't occur in the produced SQL. They are used to trace references back to an underlying database column. When a sub-query joins multiple tables, duplicate column names will be aliased to make them uniquely referenceable. In the example request above, t7.Id00 DESC points to the selector t6."Id" AS Id00, which points to the selector t5."Id", which points to the Id column in the Tags table.

Another need for tracing references is that it's not always possible to remap in-place. A post-processor pulls stale references back into scope.
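The reference chain described above (t7.Id00 to t6."Id" to t5."Id" to the Id column in Tags) can be modeled roughly as follows. This is only a sketch: TraceColumn, TraceTableColumn, TraceSubQueryColumn, and ColumnTracer are hypothetical names, not the node types from this PR.

```csharp
// Hypothetical, simplified stand-ins for the column nodes described above.
public abstract record TraceColumn;

// A column that lives directly in a database table.
public sealed record TraceTableColumn(string TableName, string ColumnName) : TraceColumn;

// A column exposed by a sub-query. It remembers the column its selector points to.
public sealed record TraceSubQueryColumn(string SubQueryAlias, string Name, TraceColumn Source) : TraceColumn;

public static class ColumnTracer
{
    // Follows sub-query columns inward until the underlying table column is reached.
    public static TraceTableColumn GetUnderlyingTableColumn(TraceColumn column)
    {
        TraceColumn current = column;

        while (current is TraceSubQueryColumn subQueryColumn)
        {
            current = subQueryColumn.Source;
        }

        return (TraceTableColumn)current;
    }
}
```

With a chain built as t7.Id00 over t6.Id over Tags.Id, GetUnderlyingTableColumn returns the Tags.Id column regardless of how many sub-query levels and aliases sit in between.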

Joins and sub-queries

At a fundamental level, all tables are joined using LEFT JOIN. If the foreign key is defined at the left side of the JSON:API relationship and it's non-nullable, it gets optimized into an INNER JOIN, which is more efficient. This optimization is not applicable when joining with a nested QueryLayer. For example, todo-items without any tags must still be returned at /todoItems?include=tags.

Initially, I thought another exception was needed for has and count in filters (see dotnet/efcore#32103). Ultimately, it comes down to interpreting what "null safe" means, so I chose to follow the Entity Framework Core behavior.
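The basic join-type rule from this section fits in a single boolean expression. The sketch below is an assumption-laden simplification: JoinKind, JoinKindSketch, and the parameter names are hypothetical, and the real decision in this PR may take more into account.

```csharp
public enum JoinKind
{
    Left,
    Inner
}

public static class JoinKindSketch
{
    // LEFT JOIN is the safe default. INNER JOIN is only chosen when the foreign key sits at the
    // left side of the relationship, cannot be NULL, and no nested query layer is being joined.
    public static JoinKind Choose(bool foreignKeyAtLeftSide, bool foreignKeyIsNullable, bool joinsNestedQueryLayer)
    {
        bool canUseInnerJoin = foreignKeyAtLeftSide && !foreignKeyIsNullable && !joinsNestedQueryLayer;
        return canUseInnerJoin ? JoinKind.Inner : JoinKind.Left;
    }
}
```

Applied to the /todoItems?include=assignee,owner,tags example from earlier: assignee (nullable foreign key at the left side) gives LEFT JOIN, owner (required foreign key at the left side) gives INNER JOIN, and tags (foreign key at the right side) gives LEFT JOIN, which matches the SQL shown there.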

It is generally safest to join every include (or nested QueryLayer) as a sub-query. But that makes the SQL harder to read and slower to execute. Depending on the nested query layer shape, the use of a sub-query can be optimized into a simple join against a table. Determining whether that optimization can be applied is non-trivial when pagination is supported. Entity Framework Core is very flexible: it employs several techniques to push the current query down into a sub-query and pull it out again at various stages while processing the input.

In this example, we determine upfront whether a sub-query is needed. Orderings from sub-queries without pagination only need to appear in the top-level query. That just leaves filtering, which may constrain the set of related resources. So, a sub-query is only produced if the nested query layer has a filter. Due to all the complexity in Entity Framework Core, we sometimes generate simpler SQL than it does (because it misses opportunities; there's an open issue for that).
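Both decisions can be sketched in C# as follows. LayerSketch is a hypothetical, stripped-down stand-in for a nested query layer (the real QueryLayer holds expression trees, not strings), and SubQueryPlanner is likewise invented for this illustration.

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical, simplified stand-in for a (nested) query layer.
public sealed record LayerSketch(
    string Name,
    string? Filter,
    IReadOnlyList<string> Sort,
    IReadOnlyList<LayerSketch> Children);

public static class SubQueryPlanner
{
    // Without pagination, only a filter can change which related rows come back,
    // so only a filtered nested layer needs its own sub-query.
    public static bool NeedsSubQuery(LayerSketch nestedLayer) => nestedLayer.Filter != null;

    // Gathers the orderings of all layers depth-first, so they can be emitted
    // together in the ORDER BY clause of the top-level query.
    public static List<string> CollectOrderings(LayerSketch layer)
    {
        var orderings = new List<string>();
        Collect(layer, orderings);
        return orderings;
    }

    private static void Collect(LayerSketch layer, List<string> orderings)
    {
        foreach (string sort in layer.Sort)
        {
            orderings.Add($"{layer.Name}: {sort}");
        }

        foreach (LayerSketch child in layer.Children)
        {
            Collect(child, orderings);
        }
    }
}
```

For the large example request, this yields the people ordering first, then the two ownedTodoItems orderings, then the two tags orderings, which is the same sequence as the ORDER BY clause in the generated SQL.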

Materialization

As mentioned earlier, Dapper is used to parse the result set into .NET objects. It scans the returned column names and starts a new object each time an Id property occurs. To make that work, we must feed the list of expected object types upfront. This is easy to determine from the requested includes.

We implement a Map method that Dapper calls with an array of all objects found in a single row. From there, we call ResourceFieldAttribute.SetValue repeatedly, while caching instances to preserve reference identity (which matches NoTrackingWithIdentityResolution). This is more flexible than the Entity Framework Core materializer, which requires everything to be ordered to support streaming. Therefore, you'll see Entity Framework Core often adds Id to orderings to achieve total ordering (with the downside of potentially not fully using an index). We're not doing that, which reduces pressure on the database server.
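The idea of caching instances while mapping rows can be sketched without Dapper itself. In the C# below, PersonRow, TodoItemRow, and IdentityMapSketch are hypothetical names, and the Map signature is simplified to two typed arguments instead of an object array; in this sketch, an absent related object is passed as null.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical resource types for illustration.
public sealed class TodoItemRow
{
    public int Id { get; set; }
    public string? Description { get; set; }
}

public sealed class PersonRow
{
    public int Id { get; set; }
    public string? LastName { get; set; }
    public List<TodoItemRow> OwnedTodoItems { get; } = new();
}

public sealed class IdentityMapSketch
{
    private readonly Dictionary<(Type, int), object> _cache = new();

    public List<PersonRow> Roots { get; } = new();

    // Called once per result row. Rows may arrive in any order, because
    // previously seen instances are looked up by type and ID.
    public void Map(PersonRow person, TodoItemRow? todoItem)
    {
        PersonRow cachedPerson = GetOrAdd(person.Id, person, out bool isNewPerson);
        if (isNewPerson) Roots.Add(cachedPerson);
        if (todoItem == null) return;

        TodoItemRow cachedItem = GetOrAdd(todoItem.Id, todoItem, out _);
        if (!cachedPerson.OwnedTodoItems.Contains(cachedItem))
        {
            cachedPerson.OwnedTodoItems.Add(cachedItem);
        }
    }

    // Returns the first instance seen for this type and ID, so reference identity is preserved.
    private T GetOrAdd<T>(int id, T candidate, out bool isNew) where T : class
    {
        if (_cache.TryGetValue((typeof(T), id), out object? existing))
        {
            isNew = false;
            return (T)existing;
        }

        _cache.Add((typeof(T), id), candidate);
        isNew = true;
        return candidate;
    }
}
```

Because lookups go through the cache, a person that appears in several non-adjacent rows still ends up as a single instance with all of its to-do items attached, which is why no total ordering of the result set is needed.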

A downside of Dapper usage is that column names in the result set must match property names exactly, so we use a post-processing step to compensate. You can see this in the example request above, where we turn:

SELECT t1."Id", ..., t7.Id00

into:

SELECT t1."Id", ..., t7.Id00 AS Id

Resource/relationship updates

The tricky part is ensuring changes are sent in the right order so you won't hit a foreign key constraint violation. For example, updating a one-to-one relationship where the foreign key exists at the right side requires first updating a row in another table.

Some of our relationship updates are more efficient because we update/delete related records in one go using WHERE "Id" IN (...), instead of issuing a SQL statement per match. On the other hand, the dynamic contents of IN reduce the effectiveness of execution plan caching, so I'm not sure it matters much.
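A parameterized IN clause of this kind could be built as in the C# sketch below. InClauseSketch and BuildDeleteByIds are hypothetical names, and the @p1, @p2, ... naming merely mirrors the style of the logged SQL earlier in this description.

```csharp
using System;
using System.Collections.Generic;

public static class InClauseSketch
{
    // Builds a single DELETE statement for all given IDs, using one parameter per ID.
    // The table name is expected to come from the data model, never from user input.
    public static (string Sql, IReadOnlyDictionary<string, object> Parameters) BuildDeleteByIds(
        string tableName, IReadOnlyList<int> ids)
    {
        if (ids.Count == 0)
        {
            throw new ArgumentException("At least one ID is required.", nameof(ids));
        }

        var parameters = new Dictionary<string, object>();

        for (int index = 0; index < ids.Count; index++)
        {
            parameters.Add($"@p{index + 1}", ids[index]);
        }

        string placeholders = string.Join(", ", parameters.Keys);
        string sql = $"DELETE FROM \"{tableName}\" WHERE \"Id\" IN ({placeholders})";
        return (sql, parameters);
    }
}
```

The number of placeholders depends on the number of IDs, so the SQL text differs per call. That varying text is what limits execution plan reuse.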

QUALITY CHECKLIST

@codecov

codecov bot commented Oct 22, 2023

Codecov Report

Merging #1361 (697c295) into master (9830302) will increase coverage by 0.92%.
The diff coverage is 93.71%.

@@            Coverage Diff             @@
##           master    #1361      +/-   ##
==========================================
+ Coverage   89.73%   90.65%   +0.92%     
==========================================
  Files         268      342      +74     
  Lines        8795    11047    +2252     
  Branches     1523     1817     +294     
==========================================
+ Hits         7892    10015    +2123     
- Misses        607      679      +72     
- Partials      296      353      +57     
Files Coverage Δ
...pperExample/AtomicOperations/AmbientTransaction.cs 100.00% <100.00%> (ø)
.../DapperExample/Controllers/OperationsController.cs 100.00% <100.00%> (ø)
src/Examples/DapperExample/Data/RotatingList.cs 100.00% <100.00%> (ø)
src/Examples/DapperExample/Data/Seeder.cs 100.00% <100.00%> (ø)
...es/DapperExample/Definitions/TodoItemDefinition.cs 100.00% <100.00%> (ø)
...es/DapperExample/FromEntitiesNavigationResolver.cs 100.00% <100.00%> (ø)
...c/Examples/DapperExample/Models/AccountRecovery.cs 100.00% <100.00%> (ø)
src/Examples/DapperExample/Models/LoginAccount.cs 100.00% <100.00%> (ø)
src/Examples/DapperExample/Models/Person.cs 100.00% <100.00%> (ø)
src/Examples/DapperExample/Models/Tag.cs 100.00% <100.00%> (ø)
... and 70 more

... and 6 files with indirect coverage changes

@bkoelman bkoelman marked this pull request as ready for review October 22, 2023 22:13
@bkoelman bkoelman merged commit 83f5a67 into master Oct 27, 2023
16 checks passed
@bkoelman bkoelman deleted the dapper-example branch October 27, 2023 01:52